##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## player_name = col_character(),
## player_extended_name = col_character(),
## quality = col_character(),
## revision = col_character(),
## origin = col_character(),
## club = col_character(),
## league = col_character(),
## nationality = col_character(),
## position = col_character(),
## date_of_birth = col_date(format = ""),
## added_date = col_date(format = ""),
## pref_foot = col_character(),
## att_workrate = col_character(),
## def_workrate = col_character(),
## traits = col_character(),
## specialities = col_character(),
## pc_last = col_logical(),
## pc_min = col_logical(),
## pc_max = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## player_name = col_character(),
## player_extended_name = col_character(),
## quality = col_character(),
## revision = col_character(),
## origin = col_character(),
## club = col_character(),
## league = col_character(),
## nationality = col_character(),
## position = col_character(),
## date_of_birth = col_date(format = ""),
## added_date = col_date(format = ""),
## pref_foot = col_character(),
## att_workrate = col_character(),
## def_workrate = col_character(),
## traits = col_character(),
## specialities = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## player_name = col_character(),
## player_extended_name = col_character(),
## quality = col_character(),
## revision = col_character(),
## origin = col_character(),
## club = col_character(),
## league = col_character(),
## nationality = col_character(),
## position = col_character(),
## date_of_birth = col_character(),
## added_date = col_character(),
## pref_foot = col_character(),
## att_workrate = col_character(),
## def_workrate = col_character(),
## traits = col_character(),
## specialities = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.
The video game FIFA, developed by Electronic Arts (EA) Sports, has become the most popular sports video game in the world in recent years, largely due to its Ultimate Team game mode. The objective of Ultimate Team is to build the best team possible by buying and selling players and by buying packs of cards, much as people buy soccer trading cards in real life. Each player receives ratings in various categories based on their real-life abilities, and each of these ratings factors into their overall rating. At the end of each season, EA Sports creates a Team of the Season (TOTS) by selecting the best player at each position in each league based on real-life performance that season. Players who receive TOTS cards also receive a boost to their overall rating to reflect that performance. Although most TOTS choices are understandable, some confuse and occasionally anger fans, and EA has never explained how it makes its selections. Using machine learning and predictive modeling, we aim to determine which variables matter most when a player is chosen for TOTS and to predict the Team of the Season for Europe’s top five leagues based on this season’s statistics.
Materials: We retrieved complete player datasets for FIFA 17, FIFA 18, and FIFA 19 from here. We retrieved real-life statistics for the 2016-2017, 2017-2018, and 2018-2019 seasons from fbref.com. We did not use data from the 2019-2020 season because COVID-19 interrupted each league's season in March 2020.
Methods:
Using these datasets, we predicted Team of the Season players with a random forest machine learning model. We tested other models, but the random forest performed best. A random forest builds many decision trees from the data, each of which predicts whether a player will make the Team of the Season based on the statistics we feed into it. It then combines the votes of all of those trees to decide whether a player should be in the Team of the Season. Finally, we apply the model to testing data that it did not see during training to check how well it really performs.
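The idea of fitting many trees on resampled data and letting them vote can be sketched in base R. This is only a toy illustration under stated assumptions: the `players` data are simulated, each "tree" is a one-split stump on a single hypothetical `goals` column, and the per-split feature sampling that a real random forest performs is omitted. It is not the model used in this analysis.

```r
# Toy illustration of bagging with majority vote (not the ranger implementation).
set.seed(1)

# Hypothetical data: goals scored and a TOTS label that loosely depends on goals.
n <- 200
players <- data.frame(goals = rpois(n, 4))
players$tots <- ifelse(players$goals + rnorm(n) > 7, "TOTS", "Normal")

# Hold out 25% of the rows; the model never sees them during fitting.
test_idx <- sample(n, size = n / 4)
train <- players[-test_idx, ]
test  <- players[test_idx, ]

# A stump is a one-split tree: predict TOTS when goals exceed a threshold.
# The threshold is the cut with the highest accuracy on the data it is given.
fit_stump <- function(df) {
  cuts <- sort(unique(df$goals))
  acc <- sapply(cuts, function(cut) {
    mean(ifelse(df$goals > cut, "TOTS", "Normal") == df$tots)
  })
  cuts[which.max(acc)]
}

# Fit each stump on a bootstrap resample of the training rows.
n_trees <- 25
thresholds <- replicate(n_trees, {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit_stump(boot)
})

# Every stump votes; the ensemble predicts the majority class.
predict_ensemble <- function(goals) {
  votes <- outer(goals, thresholds, ">")  # rows: players, columns: stumps
  ifelse(rowMeans(votes) > 0.5, "TOTS", "Normal")
}

# Accuracy on the held-out rows estimates performance on unseen players.
test_accuracy <- mean(predict_ensemble(test$goals) == test$tots)
```

The held-out accuracy at the end plays the same role as the testing-set metrics reported for each league.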
Revision: Whether the card is “Normal” or “Team of the Season (TOTS)”
Int: Interceptions
TklW: Tackles Won
OG: Own Goals
PKcon: Penalties Conceded
MP: Matches Played
Min: Minutes Played
Gls: Goals
Ast: Assists
Non_PK_G: Non-Penalty Goals (Goals from Open Play or Free Kicks)
PK: Penalty Kicks Scored
PKatt: Penalty Kicks Attempted
CrdY: Yellow Cards
CrdR: Red Cards
G_per90: Goals per 90 minutes
A_per90: Assists per 90 minutes
G_plus_A_per90: Goals plus Assists per 90 minutes
G_minus_Pk_per90: Non-Penalty Goals per 90 minutes
Rk: Table Position
GF: Goals For (Goals the player's team has scored)
GA: Goals Against (Goals the player's team has conceded)
GD: Goal Difference (GF - GA)
Pts: Team Points for the Season (3 for a win, 1 for a draw, 0 for a loss)
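The derived columns in this list follow directly from the raw counts. The sketch below shows the arithmetic in base R; the helper names `per90`, `goal_difference`, and `points` are illustrative and not taken from the original code. The inputs are row 8 of the Premier League test output further down (Sergio Aguero, FIFA 18).

```r
# Rate per 90 minutes: a count divided by the number of 90-minute units played.
per90 <- function(count, minutes) {
  round(count / (minutes / 90), 2)
}

# Sergio Aguero, FIFA 18 row in the Premier League test output.
gls <- 21; ast <- 6; pk <- 4; min_played <- 1963

g_per90          <- per90(gls, min_played)        # 0.96
a_per90          <- per90(ast, min_played)        # 0.28
g_plus_a_per90   <- per90(gls + ast, min_played)  # 1.24
g_minus_pk_per90 <- per90(gls - pk, min_played)   # 0.78

# Team-level columns follow from their definitions.
goal_difference <- function(gf, ga) gf - ga
points <- function(wins, draws) 3 * wins + draws

goal_difference(106, 27)  # 79, matching GD for Manchester City in that row
```

These values agree with the `G_per90`, `A_per90`, `G_plus_A_per90`, and `G_minus_Pk_per90` columns printed for that row.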
| League | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Premier League | 2.09 | 1.47 | 1.94 | 0.15 | 10.58 | 17.05 | 3.75 | 2.27 | 3.42 | 5.74 | 11.47 |
| La Liga | 2.05 | 1.41 | 1.85 | 0.20 | 10.60 | 16.80 | 3.89 | 2.04 | 3.46 | 5.82 | 10.62 |
| Ligue 1 | 1.95 | 1.30 | 1.75 | 0.21 | 10.49 | 16.94 | 3.67 | 1.95 | 3.17 | 5.77 | 11.09 |
| Bundesliga | 1.97 | 1.39 | 1.82 | 0.16 | 9.64 | 15.07 | 3.42 | 2.09 | 3.07 | 5.13 | 10.02 |
| Serie A | 2.05 | 1.35 | 1.85 | 0.19 | 10.61 | 16.51 | 3.73 | 2.06 | 3.32 | 5.77 | 11.07 |
The Premier League is widely considered the best league in the world: a league full of tradition and history that has seen many dominant teams and outstanding players. In recent years the league has been dominated by Manchester City and Liverpool, both of which have won league titles by large margins. With the influx of foreign money, the talent gap between the top and the bottom of the league has grown steadily, but the teams at the bottom continue to keep it competitive.
## Warning: Removed 19222 rows containing non-finite values (stat_bin).
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.055851 | 2.327128 | 2.781915 | 0.2739362 | 11.069149 | 26.94010 | 3.649960 | 2.360092 | 3.264401 | 5.455811 | 5.377800 |
| Normal | Testing | 3.200000 | 2.128000 | 2.872000 | 0.3280000 | 11.816000 | 27.45058 | 3.632292 | 2.094447 | 3.113384 | 5.492493 | 5.310808 |
| TOTS | Training | 8.942308 | 5.250000 | 8.269231 | 0.6730769 | 3.557692 | 31.76645 | 8.756876 | 4.167451 | 7.819321 | 3.268468 | 3.829581 |
| TOTS | Testing | 10.470588 | 7.352941 | 10.117647 | 0.3529412 | 4.058823 | 29.53987 | 8.768678 | 4.372373 | 8.388104 | 3.230143 | 4.999096 |
## # A tibble: 500 x 24
## position Int TklW OG PKcon Age MP Min Gls Ast Non_PK_G PK
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CDM 72 39 0 0 30 32 2125 4 2 4 0
## 2 LM 22 20 0 0 26 31 2134 5 4 4 1
## 3 LB 30 36 0 0 24 23 1759 0 1 0 0
## 4 RB 28 45 0 0 27 26 2316 4 3 4 0
## 5 LB 89 64 0 1 30 36 3153 0 2 0 0
## 6 RB 69 41 0 1 29 35 2988 0 1 0 0
## 7 ST 10 19 0 0 30 32 2545 9 5 8 1
## 8 CB 28 21 1 1 27 31 2786 1 1 1 0
## 9 CM 7 9 0 0 30 25 1856 2 5 2 0
## 10 CAM 19 15 1 0 24 36 2492 8 5 8 0
## # … with 490 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## # CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
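The `step_upsample()` step in the recipe addresses the class imbalance: only a small share of players receive TOTS cards. A minimal base R sketch of the idea follows, assuming the minority class is resampled with replacement until it matches the majority class. The `train` data and the `upsample` helper are hypothetical and stand in for whatever the recipe does internally.

```r
set.seed(42)

# Hypothetical training data: 10 TOTS cards among 100 players.
train <- data.frame(
  goals = c(rpois(90, 3), rpois(10, 9)),
  revision = factor(rep(c("Normal", "TOTS"), times = c(90, 10)))
)

# Resample every class (with replacement) up to the size of the largest class.
upsample <- function(df, class_col) {
  groups <- split(df, df[[class_col]])
  target <- max(sapply(groups, nrow))
  resampled <- lapply(groups, function(g) {
    extra <- sample(nrow(g), target - nrow(g), replace = TRUE)
    rbind(g, g[extra, , drop = FALSE])
  })
  out <- do.call(rbind, resampled)
  rownames(out) <- NULL
  out
}

balanced <- upsample(train, "revision")
table(balanced$revision)  # both classes now have 90 rows
```

Upsampling should be applied to the training data only. Duplicating rows in the testing data would distort the reported metrics.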
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.921 5 0.0145 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.948 5 0.00933 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.918 5 0.0148 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.954 5 0.0170 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.925 5 0.0172 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.951 5 0.0157 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.935 5 0.0211 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.926 5 0.0419 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.935 5 0.0151 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.942 5 0.0295 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.944 5 0.0163 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.941 5 0.0290 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.923 5 0.0245 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.925 5 0.0411 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.930 5 0.0220 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.935 5 0.0301 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.942 5 0.0178 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.927 5 0.0348 Preprocessor1_Model9
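The nine candidates in this table form a regular 3 x 3 grid over `mtry` and `min_n`. The warnings above suggest that the `mtry` value of 31 exceeded the 23 predictors left after preprocessing. The base R sketch below rebuilds the grid and shows one possible way to keep `mtry` within range. The variable names are illustrative, and the predictor count of 23 is read from the warning text rather than from the original code.

```r
# Regular grid matching the mtry and min_n values in the results table.
grid <- expand.grid(min_n = c(2, 21, 40), mtry = c(1, 16, 31))

# Assumption: 23 predictors remain after the recipe, as the warnings report.
n_predictors <- 23

# mtry cannot exceed the number of predictors, so requests of 31 are capped.
grid$mtry_effective <- pmin(grid$mtry, n_predictors)

# One alternative: space the mtry candidates between 1 and the predictor count.
grid_fixed <- expand.grid(
  min_n = c(2, 21, 40),
  mtry  = round(seq(1, n_predictors, length.out = 3))  # 1, 12, 23
)
```

With the second grid, every candidate requests a number of columns that actually exists, so the tuning run would not need to cap any of them.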
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 428 rows 31 cols
## -> target variable : 428 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0 , mean = 0.1639911 , max = 1
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.9988235 , mean = -0.0424958 , max = 0.7414913
## A new explainer has been created!
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.887 Preprocessor1_Model1
## 2 roc_auc binary 0.860 Preprocessor1_Model1
## Truth
## Prediction Normal TOTS
## Normal 116 7
## TOTS 9 10
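The reported testing accuracy can be recovered from this confusion matrix, and the same counts give a clearer picture of how the model handles the rare TOTS class. The base R sketch below does the arithmetic; the variable names are illustrative.

```r
# Premier League testing confusion matrix (rows: prediction, columns: truth).
cm <- matrix(
  c(116, 9, 7, 10), nrow = 2,
  dimnames = list(Prediction = c("Normal", "TOTS"),
                  Truth      = c("Normal", "TOTS"))
)

accuracy    <- sum(diag(cm)) / sum(cm)                 # share of correct calls
sensitivity <- cm["TOTS", "TOTS"] / sum(cm[, "TOTS"])  # true TOTS that were found
precision   <- cm["TOTS", "TOTS"] / sum(cm["TOTS", ])  # predicted TOTS that were right
baseline    <- sum(cm[, "Normal"]) / sum(cm)           # always predicting "Normal"

round(c(accuracy = accuracy, sensitivity = sensitivity,
        precision = precision, baseline = baseline), 3)
# accuracy 0.887, sensitivity 0.588, precision 0.526, baseline 0.880
```

Because TOTS cards are rare, always predicting "Normal" would already be right about 88% of the time, so sensitivity and precision for the TOTS class are more informative than accuracy alone.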
## Player revision position Int TklW OG PKcon Nation
## 1 Eric Dier 17 Normal CDM 37 34 0 0 ENG
## 2 Adam Lallana 17 TOTS CM 20 35 0 0 ENG
## 3 Sadio Mane 17 TOTS RW 11 18 0 1 SEN
## 4 Victor Moses 17 Normal RB 41 42 0 0 NGA
## 5 Paul Pogba 17 Normal CM 37 40 0 1 FRA
## 6 Victor Wanyama 17 Normal CDM 39 64 0 0 KEN
## 7 Philippe Coutinho 17 Normal LW 18 25 0 0 BRA
## 8 Sergio Aguero 18 TOTS ST 8 5 0 0 ARG
## 9 Eric Dier 18 Normal CB 30 35 0 0 ENG
## 10 Abdoulaye Doucoure 18 TOTS CDM 41 41 0 1 FRA
## 11 Andrew Robertson 18 TOTS LB 24 21 0 0 SCO
## 12 Antonio Valencia 18 Normal RB 43 37 0 0 ECU
## 13 Christian Eriksen 19 TOTS CAM 11 27 0 0 DEN
## 14 Harry Kane 19 Normal ST 4 7 0 0 ENG
## 15 James Maddison 19 TOTS CAM 12 34 0 0 ENG
## 16 Callum Wilson 19 Normal ST 1 9 0 0 ENG
## Squad Age Born MP Min minutes_played_divided_by90 Gls Ast
## 1 Tottenham 22 1994 36 3043 33.8 2 1
## 2 Liverpool 28 1988 31 2348 26.1 8 6
## 3 Liverpool 24 1992 27 2235 24.8 13 5
## 4 Chelsea 25 1990 34 2483 27.6 3 2
## 5 Manchester Utd 23 1993 30 2608 29.0 5 4
## 6 Tottenham 25 1991 36 3012 33.5 4 1
## 7 Liverpool 24 1992 31 2227 24.7 13 8
## 8 Manchester City 29 1988 25 1963 21.8 21 6
## 9 Tottenham 23 1994 34 2824 31.4 0 2
## 10 Watford 24 1993 37 3324 36.9 7 3
## 11 Liverpool 23 1994 22 1940 21.6 1 5
## 12 Manchester Utd 31 1985 31 2740 30.4 3 1
## 13 Tottenham 26 1992 35 2774 30.8 8 12
## 14 Tottenham 25 1993 28 2424 26.9 17 4
## 15 Leicester City 21 1996 36 2831 31.5 7 7
## 16 Bournemouth 26 1992 30 2528 28.1 14 9
## Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1 2 0 0 6 0 0.06 0.03 0.09 0.06
## 2 8 0 0 3 0 0.31 0.23 0.54 0.31
## 3 13 0 0 4 0 0.52 0.20 0.72 0.52
## 4 3 0 0 4 0 0.11 0.07 0.18 0.11
## 5 5 0 0 7 0 0.17 0.14 0.31 0.17
## 6 4 0 0 10 0 0.12 0.03 0.15 0.12
## 7 13 0 0 2 0 0.53 0.32 0.85 0.53
## 8 17 4 4 2 0 0.96 0.28 1.24 0.78
## 9 0 0 0 4 0 0.00 0.06 0.06 0.00
## 10 7 0 0 10 0 0.19 0.08 0.27 0.19
## 11 1 0 0 2 0 0.05 0.23 0.28 0.05
## 12 3 0 0 7 0 0.10 0.03 0.13 0.10
## 13 8 0 0 3 0 0.26 0.39 0.65 0.26
## 14 13 4 4 5 0 0.63 0.15 0.78 0.48
## 15 6 1 2 4 1 0.22 0.22 0.45 0.19
## 16 13 1 2 3 0 0.50 0.32 0.82 0.46
## G_plus_A_minus_PK_per90 Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS
## 1 0.09 2 86 26 60 86 31639 0.0002857143 0.99971429
## 2 0.54 4 78 42 36 76 53016 0.9690373056 0.03096269
## 3 0.72 4 78 42 36 76 53016 0.7454276848 0.25457232
## 4 0.18 1 85 33 52 93 41508 0.0019047619 0.99809524
## 5 0.31 6 54 29 25 69 75290 0.2535569210 0.74644308
## 6 0.15 2 86 26 60 86 31639 0.0187142857 0.98128571
## 7 0.85 4 78 42 36 76 53016 0.2255183441 0.77448166
## 8 1.05 1 106 27 79 100 54070 0.7623181004 0.23768190
## 9 0.06 3 74 36 38 77 67953 0.0850712432 0.91492876
## 10 0.27 14 44 64 -20 41 20231 0.9763224276 0.02367757
## 11 0.28 4 84 38 46 75 53049 0.9166878037 0.08331220
## 12 0.13 2 68 28 40 81 74976 0.0745622120 0.92543779
## 13 0.65 4 67 39 28 71 54216 0.8036522291 0.19634777
## 14 0.63 4 67 39 28 71 54216 0.3609930723 0.63900693
## 15 0.41 9 51 48 3 52 31851 0.9491904589 0.05080954
## 16 0.78 14 56 70 -14 45 10532 0.2149315094 0.78506849
## .pred_class
## 1 TOTS
## 2 Normal
## 3 Normal
## 4 TOTS
## 5 TOTS
## 6 TOTS
## 7 TOTS
## 8 Normal
## 9 TOTS
## 10 Normal
## 11 Normal
## 12 TOTS
## 13 Normal
## 14 TOTS
## 15 Normal
## 16 TOTS
## Warning: Novel levels found in column 'Nation': 'BFA', 'MKD', 'SKN', 'ZIM'. The
## levels have been removed, and values have been coerced to 'NA'.
| Player | Position | Squad | Matches Played | Starts | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Harry Kane | ST | Tottenham | 30 | 30 | 2632 | 21 | 13 | 7 | 53 | 0.8769657 | Starter |
| Mohamed Salah | RW | Liverpool | 32 | 29 | 2633 | 20 | 3 | 6 | 54 | 0.7440613 | Starter |
| Timo Werner | ST | Chelsea | 31 | 25 | 2243 | 6 | 6 | 4 | 58 | 0.5588417 | Starter |
| Rodri | CDM | Manchester City | 29 | 27 | 2353 | 2 | 1 | 1 | 77 | 0.8912683 | Starter |
| Bruno Fernandes | CAM | Manchester Utd | 33 | 32 | 2821 | 16 | 11 | 2 | 67 | 0.8583900 | Starter |
| Son Heung min | LM | Tottenham | 32 | 31 | 2665 | 15 | 9 | 7 | 53 | 0.7840371 | Starter |
| Harry Maguire | CB | Manchester Utd | 33 | 33 | 2970 | 2 | 1 | 2 | 67 | 0.8360587 | Starter |
| Aaron Wan Bissaka | RB | Manchester Utd | 31 | 31 | 2790 | 2 | 2 | 2 | 67 | 0.8067018 | Starter |
| Ruben Dias | CB | Manchester City | 29 | 29 | 2573 | 1 | 0 | 1 | 77 | 0.7380080 | Starter |
| Luke Shaw | LB | Manchester Utd | 29 | 27 | 2384 | 1 | 5 | 2 | 67 | 0.6883539 | Starter |
| Ollie Watkins | ST | Aston Villa | 32 | 32 | 2880 | 12 | 4 | 11 | 45 | 0.5340481 | Bench |
| Jamie Vardy | ST | Leicester City | 29 | 26 | 2401 | 13 | 8 | 3 | 62 | 0.4363794 | Bench |
| Marcus Rashford | LM | Manchester Utd | 33 | 31 | 2686 | 10 | 8 | 2 | 67 | 0.7781900 | Bench |
| Mason Mount | CAM | Chelsea | 32 | 28 | 2545 | 6 | 4 | 4 | 58 | 0.6269290 | Bench |
| Matt Targett | LB | Aston Villa | 32 | 32 | 2864 | 0 | 1 | 11 | 45 | 0.6016254 | Bench |
Premier League Team of the Season
La Liga has been dominated for many years by Barcelona and Real Madrid, two of the most storied clubs in the world. For the past decade it has been the story of Messi vs. Ronaldo, best vs. best. These two clubs have won the most Champions League trophies in the last decade, and it is rare that one of them does not win the league. Outside of those two clubs the league struggles somewhat for talent, especially defensively, but the gap has narrowed in the last few years.
## Warning: Removed 18198 rows containing non-finite values (stat_bin).
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.988764 | 2.207865 | 2.705056 | 0.2837079 | 10.772472 | 26.22697 | 3.773874 | 1.999027 | 3.360093 | 5.336711 | 4.584873 |
| Normal | Testing | 2.838983 | 2.347458 | 2.550848 | 0.2881356 | 10.686441 | 26.31591 | 3.215801 | 2.419215 | 2.784565 | 5.743371 | 4.868298 |
| TOTS | Training | 8.958333 | 5.000000 | 7.520833 | 1.4375000 | 4.083333 | 29.57315 | 8.829251 | 3.695886 | 7.795224 | 3.923922 | 3.642296 |
| TOTS | Testing | 11.933333 | 4.733333 | 10.733333 | 1.2000000 | 6.333333 | 30.12296 | 12.831138 | 3.712270 | 11.516861 | 6.488084 | 4.007976 |
## # A tibble: 534 x 23
## position Int TklW PKcon Age MP Min Gls Ast Non_PK_G PK PKatt
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CM 12 23 0 22 31 2480 3 2 3 0 0
## 2 CB 35 32 2 26 25 2023 0 1 0 0 0
## 3 CAM 53 66 0 19 30 2341 2 2 2 0 0
## 4 LM 53 52 1 21 29 2176 1 6 1 0 0
## 5 CM 55 40 0 27 36 3163 7 5 7 0 0
## 6 CB 43 38 0 32 25 1928 1 3 0 1 1
## 7 ST 7 12 0 26 33 2074 13 2 13 0 0
## 8 CDM 16 32 0 26 30 2385 0 5 0 0 0
## 9 LB 29 29 0 26 27 1737 0 2 0 0 1
## 10 CB 31 41 2 27 37 3330 1 0 1 0 0
## # … with 524 more rows, and 11 more variables: CrdY <dbl>, CrdR <dbl>,
## # G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 22...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 22...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 22...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 22...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 22...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 22...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 22...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 22...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 22...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 22...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 22...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 22...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 22...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 22...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 22...
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.896 5 0.0224 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.904 5 0.0261 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.901 5 0.0207 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.913 5 0.0249 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.893 5 0.0220 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.909 5 0.0237 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.883 5 0.0243 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.889 5 0.0381 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.881 5 0.0273 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.905 5 0.0315 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.881 5 0.0240 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.909 5 0.0297 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.876 5 0.0234 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.887 5 0.0419 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.881 5 0.0324 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.902 5 0.0337 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.878 5 0.0289 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.893 5 0.0402 Preprocessor1_Model9
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 404 rows 31 cols
## -> target variable : 404 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0.009889945 , mean = 0.1988047 , max = 0.9157083
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.6840075 , mean = -0.0799928 , max = 0.7127679
## A new explainer has been created!
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.925 Preprocessor1_Model1
## 2 roc_auc binary 0.868 Preprocessor1_Model1
## Truth
## Prediction Normal TOTS
## Normal 114 6
## TOTS 4 9
## Player revision position Int TklW OG PKcon Nation
## 1 Sergi Roberto 17 Normal RB 49 44 0 0 ESP
## 2 Kevin Prince Boateng 17 TOTS ST 19 16 0 2 GHA
## 3 Dani Carvajal 17 TOTS RB 45 41 0 0 ESP
## 4 Karim Benzema 18 Normal ST 6 6 0 0 FRA
## 5 Koke 18 Normal CM 23 41 0 0 ESP
## 6 Marcelo 18 Normal LB 26 32 0 0 BRA
## 7 Roberto 18 TOTS RB 0 0 0 0 ESP
## 8 Ever Banega 19 TOTS CDM 31 44 0 1 ARG
## 9 Djene 19 TOTS CB 59 36 0 3 TOG
## 10 Mario Hermoso 19 TOTS CB 25 25 0 2 ESP
## Squad Age Born MP Min minutes_played_divided_by90 Gls Ast
## 1 Barcelona 24 1992 32 2385 26.5 0 6
## 2 Las Palmas 29 1987 28 1978 22.0 10 4
## 3 Real Madrid 24 1992 23 2018 22.4 0 4
## 4 Real Madrid 29 1987 32 2149 23.9 5 9
## 5 Atlético Madrid 25 1992 35 2753 30.6 4 3
## 6 Real Madrid 29 1988 28 2262 25.1 2 6
## 7 Málaga 31 1986 34 3060 34.0 0 0
## 8 Sevilla 30 1988 32 2667 29.6 3 5
## 9 Getafe 26 1991 34 2976 33.1 0 0
## 10 Espanyol 23 1995 32 2806 31.2 3 0
## Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1 0 0 0 5 0 0.00 0.23 0.23 0.00
## 2 10 0 0 11 3 0.46 0.18 0.64 0.46
## 3 0 0 0 11 0 0.00 0.18 0.18 0.00
## 4 3 2 2 0 0 0.21 0.38 0.59 0.13
## 5 4 0 0 3 0 0.13 0.10 0.23 0.13
## 6 2 0 0 3 1 0.08 0.24 0.32 0.08
## 7 0 0 0 0 0 0.00 0.00 0.00 0.00
## 8 1 2 2 17 2 0.10 0.17 0.27 0.03
## 9 0 0 0 13 2 0.00 0.00 0.00 0.00
## 10 3 0 0 7 0 0.10 0.00 0.10 0.10
## G_plus_A_minus_PK_per90 Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS
## 1 0.23 2 116 37 79 90 78034 0.4945020 0.5054980
## 2 0.64 14 53 74 -21 39 20249 0.8094432 0.1905568
## 3 0.18 1 106 41 65 93 69426 0.7078319 0.2921681
## 4 0.50 3 94 44 50 76 66161 0.4543076 0.5456924
## 5 0.23 2 58 22 36 79 55483 0.3289353 0.6710647
## 6 0.32 3 94 44 50 76 66161 0.4382130 0.5617870
## 7 0.00 20 24 61 -37 20 20420 0.8628984 0.1371016
## 8 0.20 6 62 47 15 59 35993 0.7037185 0.2962815
## 9 0.00 5 48 35 13 59 11000 0.7566533 0.2433467
## 10 0.10 7 48 50 -2 53 19388 0.9327282 0.0672718
## .pred_class
## 1 TOTS
## 2 Normal
## 3 Normal
## 4 TOTS
## 5 TOTS
## 6 TOTS
## 7 Normal
## 8 Normal
## 9 Normal
## 10 Normal
## Warning: Novel levels found in column 'Nation': 'ISR', 'MLI', 'NOR'. The levels
## have been removed, and values have been coerced to 'NA'.
La Liga Team of the Season
Generally considered the weakest of the top five European leagues, Ligue 1 has been completely dominated by PSG for many years. It is often called a “farmer’s league” and is sometimes not even counted among the best leagues in the world. However, there is no doubt that PSG is one of the best teams in the world. With the likes of Mbappe and Neymar, they reached the Champions League final last season and are currently in the semi-finals.
We began our modeling for Ligue 1 by joining the Ligue 1 datasets from 2017, 2018, and 2019.
We then made exploratory plots. The first plot shows how many players were given TOTS cards in the three combined datasets. Once again, only a small proportion of players receive TOTS cards.
Next, we looked at the density of goals scored for regular players and TOTS players. In general, a larger proportion of TOTS players score a high number of goals.
Next, we looked at the density of table position by card type. The density is roughly even across table positions for normal cards, while the majority of TOTS players play for higher-ranked teams.
We then looked at the density of minutes played per match and, unsurprisingly, players who are given TOTS cards tend to play more minutes per match.
Finally, we looked at the distribution of TOTS cards by position. Strikers and center backs receive an overwhelming share of the TOTS cards in Ligue 1, which suggests that players who play in the center of the field are favored.
We also evaluated the metrics between the training and testing data to see if there was a significant difference between the two. For Ligue 1, there was not a significant difference in any of the important columns.
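A comparison like this amounts to computing the mean and standard deviation of each statistic within every combination of card type and data split. The base R sketch below shows one way to do that. The `players` data frame is simulated and only mimics the `Revision`, `Type`, and `Gls` columns, so the numbers it produces are not the ones in the table that follows.

```r
set.seed(7)

# Hypothetical combined data: card type, data split, and goals scored.
players <- data.frame(
  Revision = rep(c("Normal", "TOTS"), times = c(40, 8)),
  Type     = rep(c("Training", "Testing"), length.out = 48),
  Gls      = c(rpois(40, 3), rpois(8, 9))
)

# Mean and standard deviation of goals for each Revision x Type group.
summary_tbl <- aggregate(
  Gls ~ Revision + Type, data = players,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# aggregate() stores the two statistics in a matrix column; flatten it
# into ordinary columns named Gls.mean and Gls.sd.
summary_tbl <- do.call(data.frame, summary_tbl)
summary_tbl
```

Similar means for the training and testing rows of the same card type indicate that the split did not put unusually strong or weak players on one side.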
## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.721449 | 2.069638 | 2.428969 | 0.2924791 | 10.944290 | 26.68521 | 3.426052 | 2.017562 | 2.966262 | 5.416942 | 4.902297 |
| Normal | Testing | 3.075630 | 2.025210 | 2.722689 | 0.3529412 | 11.613445 | 27.59384 | 3.751067 | 2.023013 | 3.244093 | 5.611896 | 5.420954 |
| TOTS | Training | 9.041667 | 4.645833 | 7.791667 | 1.2500000 | 3.562500 | 28.32431 | 8.409615 | 3.361165 | 7.070942 | 3.548486 | 4.997419 |
| TOTS | Testing | 9.933333 | 4.600000 | 8.000000 | 1.9333333 | 4.266667 | 30.13407 | 10.278040 | 4.239272 | 8.799351 | 4.333700 | 3.402039 |
## # A tibble: 502 x 24
## position Int TklW OG PKcon Age MP Min Gls Ast Non_PK_G PK
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 RB 39 45 0 0 26 27 2149 0 3 0 0
## 2 LM 11 26 0 0 23 38 2476 3 5 3 0
## 3 RB 51 31 0 0 22 27 2324 0 1 0 0
## 4 ST 14 16 0 0 26 36 2225 10 4 10 0
## 5 CB 41 36 0 0 32 32 2635 1 0 1 0
## 6 RB 72 74 0 3 29 27 2395 0 1 0 0
## 7 RB 67 31 0 1 26 26 2198 1 1 1 0
## 8 CB 27 26 0 1 25 34 3015 1 0 1 0
## 9 RB 74 63 0 0 23 30 2646 0 4 0 0
## 10 LB 24 23 0 3 22 27 2121 0 4 0 0
## # … with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## # CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
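We then examined the cross-validated metrics of the nine candidate models, each averaged over the five folds. The second model (mtry = 1, min_n = 21) has the highest mean ROC AUC at 0.943, while the fourth model (mtry = 16, min_n = 2) has the highest mean accuracy at 0.929.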
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.921 5 0.0162 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.941 5 0.0194 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.917 5 0.0146 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.943 5 0.0182 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.912 5 0.0156 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.940 5 0.0166 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.929 5 0.0178 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.936 5 0.0238 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.911 5 0.0196 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.925 5 0.0247 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.909 5 0.0148 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.921 5 0.0283 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.916 5 0.0166 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.935 5 0.0228 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.902 5 0.0202 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.926 5 0.0243 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.892 5 0.0193 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.922 5 0.0264 Preprocessor1_Model9
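To make the comparison in the tuning table explicit, here is a minimal base R sketch that picks the best configuration for each metric. The numbers are copied from the table above; the data frame layout and the helper `best_by()` are assumptions made for illustration and are not part of the original analysis.

```r
# Cross-validated means copied from the Ligue 1 tuning table above.
tuning <- data.frame(
  config   = paste0("Model", 1:9),
  mtry     = rep(c(1, 16, 31), each = 3),
  min_n    = rep(c(2, 21, 40), times = 3),
  accuracy = c(0.921, 0.917, 0.912, 0.929, 0.911, 0.909, 0.916, 0.902, 0.892),
  roc_auc  = c(0.941, 0.943, 0.940, 0.936, 0.925, 0.921, 0.935, 0.926, 0.922),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the row with the highest value of one metric.
best_by <- function(results, metric) {
  results[which.max(results[[metric]]), ]
}

best_accuracy <- best_by(tuning, "accuracy")
best_roc_auc  <- best_by(tuning, "roc_auc")

print(best_accuracy)
print(best_roc_auc)
```

The two metrics disagree here, so the choice of metric decides which configuration is carried forward. With a tuning result object from the tune package, `select_best()` with a `metric` argument should serve the same purpose.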
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 407 rows 31 cols
## -> target variable : 407 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0 , mean = 0.1541278 , max = 1
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.89 , mean = -0.03619165 , max = 0.52
## A new explainer has been created!
In this model, the most important variables are minutes played, goal differential, and goals plus assists per 90 minutes. These three variables contribute to the card classification significantly more than the other variables.
After running the random forest model, our model accuracy comes out to about 86.56%. The remaining errors are likely due to many players outperforming their card rank, as well as many teams outperforming their projections.
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.866 Preprocessor1_Model1
## 2 roc_auc binary 0.912 Preprocessor1_Model1
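Overall, this model predicted that 19 players met our criteria to be selected for team of the season, 9 of them correctly, while misclassifying 16 players in total (6 actual TOTS players predicted as Normal and 10 Normal players predicted as TOTS).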
## Truth
## Prediction Normal TOTS
## Normal 109 6
## TOTS 10 9
The misclassified players are shown below:
## Player revision position Int TklW OG PKcon Nation Squad
## 1 Lois Diony 17 17 TOTS ST 4 14 0 0 FRA Dijon
## 2 Blaise Matuidi 17 17 Normal CDM 40 42 0 0 FRA Paris S-G
## 3 Adrien Rabiot 17 17 Normal CM 38 46 0 0 FRA Paris S-G
## 4 Djibril Sidibe 17 17 Normal RB 47 52 0 1 FRA Monaco
## 5 Jemerson 17 17 Normal CB 54 51 0 0 BRA Monaco
## 6 Giovani Lo Celso 18 18 Normal CAM 20 59 0 0 ARG Paris S-G
## 7 Alassane Plea 18 18 Normal ST 13 9 0 0 FRA Nice
## 8 Adil Rami 18 18 TOTS CB 33 20 1 0 FRA Marseille
## 9 Dani Alves 18 18 Normal RB 28 52 0 0 BRA Paris S-G
## 10 Radamel Falcao 18 18 TOTS ST 13 7 1 0 COL Monaco
## 11 Joao Moutinho 18 18 Normal CM 39 44 0 0 POR Monaco
## 12 Houssem Aouar 19 19 Normal CM 31 36 0 0 FRA Lyon
## 13 Kenny Lala 19 19 TOTS RB 29 43 0 1 FRA Strasbourg
## 14 Ferland Mendy 19 19 TOTS LB 25 30 0 1 FRA Lyon
## 15 Teji Savanier 19 19 TOTS CDM 44 63 0 0 FRA Nîmes
## 16 Zeki Celik 19 19 Normal RB 34 55 0 3 TUR Lille
## Age Born MP Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1 23 1992 35 2807 31.2 11 7 11 0 0 2
## 2 29 1987 34 2415 26.8 4 4 4 0 0 4
## 3 21 1995 27 1935 21.5 3 2 3 0 0 2
## 4 24 1992 29 2321 25.8 2 5 2 0 0 7
## 5 23 1992 34 3058 34.0 2 0 2 0 0 8
## 6 21 1996 33 1776 19.7 4 2 4 0 0 2
## 7 24 1993 35 3041 33.8 16 4 15 1 2 7
## 8 31 1985 33 2955 32.8 1 1 1 0 0 5
## 9 34 1983 25 2065 22.9 1 4 1 0 0 7
## 10 31 1986 26 2128 23.6 18 2 15 3 4 1
## 11 30 1986 33 2802 31.1 1 4 1 0 0 6
## 12 20 1998 37 3061 34.0 7 7 7 0 0 2
## 13 26 1991 34 3060 34.0 5 9 4 1 2 4
## 14 23 1995 30 2531 28.1 2 1 2 0 0 2
## 15 26 1991 32 2864 31.8 6 14 2 4 4 6
## 16 21 1997 34 2971 33.0 1 5 1 0 0 5
## CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1 1 0.35 0.22 0.58 0.35 0.58
## 2 0 0.15 0.15 0.30 0.15 0.30
## 3 0 0.14 0.09 0.23 0.14 0.23
## 4 0 0.08 0.19 0.27 0.08 0.27
## 5 2 0.06 0.00 0.06 0.06 0.06
## 6 0 0.20 0.10 0.30 0.20 0.30
## 7 0 0.47 0.12 0.59 0.44 0.56
## 8 0 0.03 0.03 0.06 0.03 0.06
## 9 1 0.04 0.17 0.22 0.04 0.22
## 10 0 0.76 0.08 0.85 0.63 0.72
## 11 0 0.03 0.13 0.16 0.03 0.16
## 12 0 0.21 0.21 0.41 0.21 0.41
## 13 0 0.15 0.26 0.41 0.12 0.38
## 14 0 0.07 0.04 0.11 0.07 0.11
## 15 1 0.19 0.44 0.63 0.06 0.50
## 16 1 0.03 0.15 0.18 0.03 0.18
## Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 16 46 58 -12 37 10126 0.605 0.395 Normal
## 2 2 83 27 56 87 45160 0.200 0.800 TOTS
## 3 2 83 27 56 87 45160 0.245 0.755 TOTS
## 4 1 107 31 76 95 9586 0.190 0.810 TOTS
## 5 1 107 31 76 95 9586 0.170 0.830 TOTS
## 6 1 108 29 79 93 46929 0.360 0.640 TOTS
## 7 8 53 52 1 54 22876 0.180 0.820 TOTS
## 8 4 80 47 33 77 46040 0.770 0.230 Normal
## 9 1 108 29 79 93 46929 0.110 0.890 TOTS
## 10 2 85 45 40 80 9243 0.810 0.190 Normal
## 11 2 85 45 40 80 9243 0.100 0.900 TOTS
## 12 3 70 47 23 72 49079 0.200 0.800 TOTS
## 13 11 58 48 10 49 25216 0.940 0.060 Normal
## 14 3 70 47 23 72 49079 0.705 0.295 Normal
## 15 9 57 58 -1 53 13994 0.580 0.420 Normal
## 16 2 68 33 35 75 34079 0.380 0.620 TOTS
## Warning: Novel levels found in column 'Nation': 'AUT', 'CAN', 'CHI', 'CRC',
## 'ECU', 'PER', 'SCO', 'ZIM'. The levels have been removed, and values have been
## coerced to 'NA'.
Finally, using statistics from the 2020-2021 season, we are able to see that Gaetan Laborde, Kylian Mbappe, Memphis Depay, Kevin Volland, and Wissam Ben Yedder were the top 5 attackers in Ligue 1.
The model also shows that Jonathan Bamba, Idrissa Gana Gueye, Aurelien Tchouameni, Maxence Caqueret, and Ander Herrera are the top 5 midfielders in Ligue 1.
Lastly, the top 5 defenders are shown to be Thomas Delaine, Leo Dubois, Presnel Kimpembe, Damien Da Silva, and Thilo Kehrer.
| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Gaetan Laborde | ST | Montpellier | 34 | 2932 | 13 | 8 | 8 | 47 | 0.890 | Starter |
| Memphis Depay | CF | Lyon | 34 | 2653 | 18 | 9 | 4 | 67 | 0.840 | Starter |
| Kylian Mbappe | ST | Paris S-G | 29 | 2214 | 25 | 7 | 2 | 72 | 0.690 | Starter |
| Jonathan Bamba | LM | Lille | 34 | 2719 | 6 | 9 | 1 | 73 | 0.690 | Starter |
| Aurelien Tchouameni | CM | Monaco | 32 | 2703 | 2 | 4 | 3 | 71 | 0.520 | Starter |
| Idrissa Gana Gueye | CDM | Paris S-G | 25 | 1482 | 2 | 1 | 2 | 72 | 0.470 | Starter |
| Thomas Delaine | LB | Metz | 22 | 1600 | 3 | 1 | 10 | 43 | 0.460 | Starter |
| Leonardo Balerdi | CB | Marseille | 17 | 1363 | 2 | 0 | 6 | 55 | 0.445 | Starter |
| Leo Dubois | RB | Lyon | 33 | 2610 | 2 | 3 | 4 | 67 | 0.400 | Starter |
| Presnel Kimpembe | CB | Paris S-G | 25 | 2037 | 0 | 0 | 2 | 72 | 0.360 | Starter |
| Kevin Volland | ST | Monaco | 31 | 2419 | 15 | 7 | 3 | 71 | 0.680 | Bench |
| Wissam Ben Yedder | ST | Monaco | 33 | 2266 | 18 | 5 | 3 | 71 | 0.660 | Bench |
| Ander Herrera | CM | Paris S-G | 27 | 1571 | 1 | 3 | 2 | 72 | 0.460 | Bench |
| Leandro Paredes | CM | Paris S-G | 20 | 1288 | 1 | 2 | 2 | 72 | 0.420 | Bench |
| Senou Coulibaly | CB | Dijon | 19 | 1664 | 2 | 0 | 20 | 18 | 0.355 | Bench |
Ligue 1 Team of the Season
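The Starter and Bench labels in the table are consistent with ranking players by predicted TOTS probability within each line of the pitch. The sketch below reproduces that labelling for the attackers and midfielders in the table. The position-to-line mapping, the cutoff of three starters per line, and all variable names are assumptions made for illustration; the original selection code is not shown in this report.

```r
# Rows copied from the Ligue 1 table above; prob is the predicted TOTS probability.
candidates <- data.frame(
  player   = c("Gaetan Laborde", "Memphis Depay", "Kylian Mbappe",
               "Kevin Volland", "Wissam Ben Yedder",
               "Jonathan Bamba", "Aurelien Tchouameni", "Idrissa Gana Gueye",
               "Ander Herrera", "Leandro Paredes"),
  position = c("ST", "CF", "ST", "ST", "ST", "LM", "CM", "CDM", "CM", "CM"),
  prob     = c(0.890, 0.840, 0.690, 0.680, 0.660,
               0.690, 0.520, 0.470, 0.460, 0.420),
  stringsAsFactors = FALSE
)

# Assumed mapping from position to line; only positions in this sample are listed.
line_of <- c(ST = "attack", CF = "attack",
             LM = "midfield", CM = "midfield", CDM = "midfield")
candidates$line <- unname(line_of[candidates$position])

# Assumed squad shape: three starters per line, everyone else on the bench.
n_starters <- 3

# Rank players within their line by descending probability.
candidates$rank_in_line <- ave(-candidates$prob, candidates$line,
                               FUN = function(x) rank(x, ties.method = "first"))
candidates$role <- ifelse(candidates$rank_in_line <= n_starters, "Starter", "Bench")

print(candidates[order(candidates$line, candidates$rank_in_line),
                 c("player", "line", "prob", "role")])
```

Ranking within a line rather than across the whole squad is what keeps a high-probability striker such as Kevin Volland on the bench while a lower-probability midfielder starts.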
Often called the league of the people because its 50+1 rule requires club members to hold a majority of the voting rights at nearly every club, the German Bundesliga is considered the second best defensive league behind the Premier League. Bayern Munich have dominated the league for many years, often poaching the best players from other teams in the league.
## Warning: Removed 17990 rows containing non-finite values (stat_bin).
## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | 90s Played | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | 90s Played SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.753571 | 2.142857 | 2.528571 | 0.2250000 | 9.821429 | 25.32901 | 3.279127 | 2.049815 | 2.971653 | 4.919084 | 3.963559 |
| Normal | Testing | 2.849462 | 2.064516 | 2.634409 | 0.2150538 | 10.075269 | 24.80585 | 3.653308 | 1.904265 | 3.209310 | 4.759970 | 3.341268 |
| TOTS | Training | 8.687500 | 5.437500 | 7.854167 | 0.8333333 | 3.750000 | 27.58588 | 7.754631 | 3.902570 | 6.866476 | 2.935476 | 3.970949 |
| TOTS | Testing | 9.625000 | 6.625000 | 8.062500 | 1.5625000 | 7.062500 | 26.82292 | 7.022583 | 3.827532 | 5.960635 | 4.106397 | 4.332935 |
## # A tibble: 434 x 24
## position Int TklW OG PKcon Age MP Min Gls Ast Non_PK_G PK
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CM 37 51 0 1 28 27 2302 1 3 1 0
## 2 LM 29 10 0 0 30 26 1982 7 4 6 1
## 3 ST 11 20 0 0 23 28 1735 5 1 5 0
## 4 CB 34 41 0 0 28 27 2315 3 2 3 0
## 5 CB 74 39 0 0 19 25 2106 0 0 0 0
## 6 CB 10 30 0 0 24 26 2126 3 0 3 0
## 7 LM 27 17 0 0 26 25 1729 1 3 1 0
## 8 CB 16 21 0 1 21 28 2427 0 2 0 0
## 9 CB 35 23 0 1 32 30 2637 0 0 0 0
## 10 CB 21 12 0 1 22 24 1901 1 0 1 0
## # … with 424 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## # CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.884 5 0.0130 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.907 5 0.0111 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.893 5 0.0133 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.890 5 0.0160 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.884 5 0.0146 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.900 5 0.0105 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.869 5 0.0121 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.867 5 0.0202 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.884 5 0.0146 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.862 5 0.0172 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.875 5 0.0144 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.862 5 0.0169 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.863 5 0.0196 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.864 5 0.0240 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.863 5 0.0212 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.851 5 0.0140 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.869 5 0.0181 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.852 5 0.0175 Preprocessor1_Model9
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 328 rows 31 cols
## -> target variable : 328 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0.01184855 , mean = 0.235783 , max = 0.9191418
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.7846212 , mean = -0.08944149 , max = 0.714248
## A new explainer has been created!
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.835 Preprocessor1_Model1
## 2 roc_auc binary 0.862 Preprocessor1_Model1
## Truth
## Prediction Normal TOTS
## Normal 84 9
## TOTS 9 7
## Truth
## Prediction Normal TOTS
## Normal 85 9
## TOTS 8 7
## Player revision position Int TklW OG PKcon Nation
## 1 Kerem Demirbay 17 Normal CAM 44 33 0 0 GER
## 2 Marco Fabian 17 TOTS CAM 51 31 0 0 MEX
## 3 Vincenzo Grifo 17 TOTS LM 37 31 0 0 ITA
## 4 Sebastian Rudy 17 TOTS CM 99 64 0 0 GER
## 5 Javi Martinez 17 Normal CB 55 37 0 0 ESP
## 6 Julian Brandt 18 Normal LM 15 16 0 0 GER
## 7 Michael Gregoritsch 18 TOTS CAM 12 15 0 0 AUT
## 8 Thorgan Hazard 18 TOTS LM 20 33 0 0 BEL
## 9 Naby Keita 18 TOTS CM 22 33 0 0 GUI
## 10 Andrej Kramaric 18 Normal ST 8 3 0 0 CRO
## 11 Philipp Max 18 TOTS LB 19 31 0 0 GER
## 12 Nils Petersen 18 TOTS ST 15 16 0 0 GER
## 13 Wendell 18 TOTS LB 20 25 0 0 BRA
## 14 Ishak Belfodil 19 Normal ST 2 9 0 0 ALG
## 15 Mats Hummels 19 Normal CB 27 13 0 1 GER
## 16 Andrej Kramaric 19 Normal ST 8 11 0 0 CRO
## 17 Lukasz Piszczek 19 Normal RB 31 30 0 0 POL
## Squad Age Born MP Min minutes_played_divided_by90 Gls Ast Non_PK_G
## 1 Hoffenheim 23 1993 28 2169 24.1 6 8 6
## 2 Eint Frankfurt 27 1989 24 2054 22.8 7 4 6
## 3 Freiburg 23 1993 30 2492 27.7 6 7 5
## 4 Hoffenheim 26 1990 32 2786 31.0 2 6 2
## 5 Bayern Munich 27 1988 25 2131 23.7 1 1 1
## 6 Leverkusen 21 1996 34 2326 25.8 9 3 9
## 7 Augsburg 23 1994 32 2527 28.1 13 3 12
## 8 M'Gladbach 24 1993 34 2939 32.7 10 5 5
## 9 RB Leipzig 22 1995 27 1962 21.8 6 5 6
## 10 Hoffenheim 26 1991 34 2228 24.8 13 6 11
## 11 Augsburg 23 1993 33 2959 32.9 2 12 2
## 12 Freiburg 28 1988 32 2244 24.9 15 1 10
## 13 Leverkusen 24 1993 26 2115 23.5 2 3 0
## 14 Hoffenheim 26 1992 28 1863 20.7 16 3 16
## 15 Bayern Munich 29 1988 21 1775 19.7 1 1 1
## 16 Hoffenheim 27 1991 30 2396 26.6 17 4 12
## 17 Dortmund 33 1985 20 1756 19.5 1 6 1
## PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1 0 0 4 0 0.25 0.33 0.58 0.25
## 2 1 2 10 0 0.31 0.18 0.48 0.26
## 3 1 1 1 0 0.22 0.25 0.47 0.18
## 4 0 0 9 0 0.06 0.19 0.26 0.06
## 5 0 0 5 0 0.04 0.04 0.08 0.04
## 6 0 0 0 0 0.35 0.12 0.46 0.35
## 7 1 1 3 0 0.46 0.11 0.57 0.43
## 8 5 6 1 0 0.31 0.15 0.46 0.15
## 9 0 0 8 2 0.28 0.23 0.50 0.28
## 10 2 2 1 0 0.53 0.24 0.77 0.44
## 11 0 0 5 0 0.06 0.36 0.43 0.06
## 12 5 6 4 1 0.60 0.04 0.64 0.40
## 13 2 3 7 1 0.09 0.13 0.21 0.00
## 14 0 0 3 0 0.77 0.14 0.92 0.77
## 15 0 0 1 0 0.05 0.05 0.10 0.05
## 16 5 6 2 0 0.64 0.15 0.79 0.45
## 17 0 0 3 0 0.05 0.31 0.36 0.05
## G_plus_A_minus_PK_per90 Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS
## 1 0.58 4 64 37 27 62 28155 0.4478120 0.5521880
## 2 0.44 11 36 43 -7 42 49165 0.8021510 0.1978490
## 3 0.43 7 42 60 -18 48 23959 0.8394151 0.1605849
## 4 0.26 4 64 37 27 62 28155 0.5792912 0.4207088
## 5 0.08 1 89 22 67 82 75000 0.2754054 0.7245946
## 6 0.46 5 58 44 14 55 28415 0.4763345 0.5236655
## 7 0.53 12 43 46 -3 41 28238 0.7306608 0.2693392
## 8 0.31 9 47 52 -5 47 50986 0.6338276 0.3661724
## 9 0.50 6 57 53 4 53 39397 0.7875857 0.2124143
## 10 0.69 3 66 48 18 55 28716 0.2964712 0.7035288
## 11 0.43 12 43 46 -3 41 28238 0.5823812 0.4176188
## 12 0.44 15 32 56 -24 36 23894 0.7362705 0.2637295
## 13 0.13 5 58 44 14 55 28415 0.7559639 0.2440361
## 14 0.92 9 70 52 18 51 28456 0.3138931 0.6861069
## 15 0.10 1 88 32 56 78 75000 0.3905305 0.6094695
## 16 0.60 9 70 52 18 51 28456 0.3151089 0.6848911
## 17 0.36 2 81 44 37 76 80841 0.4234112 0.5765888
## .pred_class
## 1 TOTS
## 2 Normal
## 3 Normal
## 4 Normal
## 5 TOTS
## 6 TOTS
## 7 Normal
## 8 Normal
## 9 Normal
## 10 TOTS
## 11 Normal
## 12 Normal
## 13 Normal
## 14 TOTS
## 15 TOTS
## 16 TOTS
## 17 TOTS
## Warning: Novel levels found in column 'Nation': 'ANG', 'ARM', 'BEN', 'BFA',
## 'BUL', 'CAN', 'ECU', 'FRO', 'MKD', 'WAL'. The levels have been removed, and
## values have been coerced to 'NA'.
| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Wout Weghorst | ST | Wolfsburg | 31 | 2671 | 20 | 7 | 3 | 57 | 0.8606470 | Starter |
| Robert Lewandowski | ST | Bayern Munich | 26 | 2188 | 36 | 6 | 1 | 71 | 0.7757811 | Starter |
| Erling Haaland | ST | Dortmund | 26 | 2227 | 25 | 5 | 5 | 55 | 0.7708077 | Starter |
| Thomas Muller | CAM | Bayern Munich | 29 | 2453 | 10 | 17 | 1 | 71 | 0.7518961 | Starter |
| Joshua Kimmich | CDM | Bayern Munich | 24 | 1924 | 3 | 10 | 1 | 71 | 0.6495306 | Starter |
| Leroy Sane | LM | Bayern Munich | 29 | 1672 | 4 | 9 | 1 | 71 | 0.6300856 | Starter |
| David Alaba | CB | Bayern Munich | 29 | 2454 | 2 | 2 | 1 | 71 | 0.5236168 | Starter |
| Jerome Boateng | CB | Bayern Munich | 26 | 2148 | 1 | 1 | 1 | 71 | 0.5121558 | Starter |
| Willi Orban | CB | RB Leipzig | 26 | 2093 | 4 | 1 | 2 | 64 | 0.4841315 | Starter |
| Ridle Baku | RB | Wolfsburg | 29 | 2409 | 6 | 4 | 3 | 57 | 0.4643203 | Starter |
| Andre Silva | ST | Eint Frankfurt | 29 | 2490 | 25 | 6 | 4 | 56 | 0.7664153 | Bench |
| Sasa Kalajdzic | ST | Stuttgart | 30 | 1874 | 14 | 4 | 10 | 39 | 0.5230555 | Bench |
| Marcel Sabitzer | CM | RB Leipzig | 24 | 1756 | 7 | 2 | 2 | 64 | 0.6218725 | Bench |
| Leon Goretzka | CM | Bayern Munich | 23 | 1695 | 5 | 5 | 1 | 71 | 0.6091551 | Bench |
| Angelino | LB | RB Leipzig | 24 | 2042 | 4 | 4 | 2 | 64 | 0.4439111 | Bench |
Bundesliga Team of the Season
Serie A has one of the richest histories in Europe, with AC Milan, Inter Milan, and Juventus all having had great success. In recent years, however, the league has been completely dominated by Juventus, who won 9 titles in a row before Inter ended the streak this year.
## Warning: Removed 19578 rows containing non-finite values (stat_bin).
First we made a bar chart to see the number of team of the season players in the Serie A.
Next we made a density plot of goals. Team of the season players tend to score slightly more goals than normal players.
Then we made a density plot of team rank of the team of the season players vs normal players. We can see that the team of the season players finish much higher in the table.
Next we made a distribution plot of how much the team of the season players play vs normal players. The team of the season players tend to play a lot more.
We then made a plot of the positional breakdown of all the players. The players are concentrated among center backs, center mids, and strikers.
Next we made a table to compare important stats for the training and testing data.
## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | 90s Played | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | 90s Played SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.172702 | 2.350975 | 2.883008 | 0.2896936 | 10.777159 | 27.01501 | 3.655291 | 2.215850 | 3.263599 | 5.538129 | 4.844356 |
| Normal | Testing | 2.873950 | 2.134454 | 2.563025 | 0.3109244 | 10.647059 | 26.81410 | 3.472787 | 2.306675 | 3.219745 | 5.453328 | 4.813499 |
| TOTS | Training | 10.339623 | 4.849057 | 9.301887 | 1.0377358 | 4.301887 | 29.78973 | 8.864240 | 3.307313 | 7.655026 | 3.220039 | 5.059081 |
| TOTS | Testing | 9.411765 | 5.176471 | 8.000000 | 1.4117647 | 3.882353 | 28.83987 | 7.080420 | 4.333522 | 5.623611 | 3.407388 | 5.063365 |
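A comparison table of this shape can be built by grouping on card revision and data split and computing a mean and a standard deviation per group. The `summarise()` messages in the output suggest the original used dplyr; the sketch below uses base R instead so it runs without extra packages. The rows are toy values for illustration only and are not taken from the real data, and the column names `split`, `Gls`, and `Ast` are assumptions.

```r
# Toy rows for illustration only; these are not values from the real data set.
players <- data.frame(
  revision = c("Normal", "Normal", "TOTS", "TOTS",
               "Normal", "Normal", "TOTS", "TOTS"),
  split    = c("Training", "Training", "Training", "Training",
               "Testing", "Testing", "Testing", "Testing"),
  Gls      = c(2, 4, 10, 14, 1, 3, 8, 12),
  Ast      = c(1, 3, 5, 7, 2, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Mean and standard deviation of each statistic within every revision/split group.
means <- aggregate(cbind(Gls, Ast) ~ revision + split, data = players, FUN = mean)
sds   <- aggregate(cbind(Gls, Ast) ~ revision + split, data = players, FUN = sd)

# Join the two summaries into one wide table, suffixing the SD columns.
summary_table <- merge(means, sds, by = c("revision", "split"),
                       suffixes = c("", "_SD"))
print(summary_table)
```

If the training and testing rows for a given revision look similar in such a table, the split has not badly skewed either set.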
## # A tibble: 502 x 24
## position Int TklW OG PKcon Age MP Min Gls Ast Non_PK_G PK
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CB 51 32 0 2 27 37 3258 2 0 2 0
## 2 RW 33 27 0 0 26 35 2516 12 8 10 2
## 3 LB 15 24 0 0 31 24 1807 3 2 3 0
## 4 CM 40 38 0 0 32 34 2741 0 0 0 0
## 5 CDM 28 23 0 0 21 25 1803 0 1 0 0
## 6 CB 18 20 0 1 22 32 2880 0 1 0 0
## 7 CB 25 26 0 1 31 24 1927 0 0 0 0
## 8 ST 9 15 0 2 28 33 2719 11 0 11 0
## 9 LB 15 24 0 0 31 24 1807 3 2 3 0
## 10 CM 37 52 0 0 32 28 1984 0 4 0 0
## # … with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## # CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.905 5 0.0139 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.896 5 0.0142 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.896 5 0.0135 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.889 5 0.0188 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.901 5 0.0102 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.896 5 0.0191 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.874 5 0.0163 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.878 5 0.0202 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.869 5 0.0101 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.878 5 0.0199 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.869 5 0.0114 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.880 5 0.0236 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.867 5 0.0168 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.867 5 0.0248 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.857 5 0.00866 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.869 5 0.0251 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.859 5 0.0150 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.871 5 0.0218 Preprocessor1_Model9
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 412 rows 31 cols
## -> target variable : 412 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0.0004166667 , mean = 0.1841267 , max = 0.9732857
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.6189495 , mean = -0.05548593 , max = 0.7697215
## A new explainer has been created!
Here is a plot of the most important variables in our Serie A model. “Minutes Played”, “Tackles Won”, and “Assists” appear to be the most important.
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.919 Preprocessor1_Model1
## 2 roc_auc binary 0.944 Preprocessor1_Model1
## Truth
## Prediction Normal TOTS
## Normal 117 9
## TOTS 2 8
Here is a confusion matrix of the predictions and true values for the testing data. The model identified 8 team of the season players correctly, missed 9, and incorrectly flagged 2 normal players. While this is not great, it is mostly acceptable because the predicted probabilities seem to be ordered fairly well.
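The confusion matrix printed above can be turned into summary metrics directly. The sketch below uses base R and the four counts from that matrix; the variable names are assumptions made for illustration.

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = truth).
cm <- matrix(c(117, 2, 9, 8), nrow = 2,
             dimnames = list(Prediction = c("Normal", "TOTS"),
                             Truth      = c("Normal", "TOTS")))

# Share of all players that were classified correctly.
accuracy <- sum(diag(cm)) / sum(cm)

# Of the players predicted as TOTS, the share that really were TOTS.
precision <- cm["TOTS", "TOTS"] / sum(cm["TOTS", ])

# Of the true TOTS players, the share the model found.
recall <- cm["TOTS", "TOTS"] / sum(cm[, "TOTS"])

print(round(c(accuracy = accuracy, precision = precision, recall = recall), 3))
```

Accuracy is dominated by the large Normal class, so precision and recall for the TOTS class give a clearer picture of how well the rare class is predicted.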
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Truth
## Prediction Normal TOTS
## Normal 118 9
## TOTS 1 8
Here are the players in the testing data that our model misclassified. It is a wide variety of players: some were likely misclassified because of position, others because of team performance, and others because of personal performance.
## Player revision position Int TklW OG PKcon Nation Squad
## 1 Mattia Caldara 17 TOTS CB 90 36 0 0 ITA Atalanta
## 2 Giorgio Chiellini 18 TOTS CB 28 15 0 0 ITA Juventus
## 3 Federico Chiesa 18 TOTS RW 9 37 0 0 ITA Fiorentina
## 4 Marek Hamsik 18 TOTS CM 11 25 0 0 SVK Napoli
## 5 Fabio Quagliarella 18 TOTS ST 8 9 0 0 ITA Sampdoria
## 6 Emre Can 19 TOTS CM 21 58 1 1 GER Juventus
## 7 Giorgio Chiellini 19 TOTS CB 23 9 0 0 ITA Juventus
## 8 Rodrigo De Paul 19 TOTS CM 36 31 0 1 ARG Udinese
## 9 Mario Mandzukic 19 Normal ST 12 22 0 0 CRO Juventus
## 10 Allan 19 TOTS CM 16 92 0 0 BRA Napoli
## Age Born MP Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1 22 1994 30 2655 29.5 7 0 7 0 0 4
## 2 32 1984 26 2161 24.0 0 1 0 0 0 2
## 3 19 1997 36 3012 33.5 6 4 6 0 0 7
## 4 30 1987 38 2371 26.3 7 1 7 0 0 2
## 5 34 1983 35 2719 30.2 19 5 12 7 8 4
## 6 24 1994 29 1811 20.1 4 1 3 1 1 7
## 7 33 1984 25 1991 22.1 1 1 1 0 0 3
## 8 24 1994 36 3189 35.4 9 9 6 3 6 7
## 9 32 1986 25 2014 22.4 9 6 9 0 0 4
## 10 27 1991 33 2616 29.1 1 3 1 0 0 10
## CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1 0 0.24 0.00 0.24 0.24 0.24
## 2 0 0.00 0.04 0.04 0.00 0.04
## 3 0 0.18 0.12 0.30 0.18 0.30
## 4 0 0.27 0.04 0.30 0.27 0.30
## 5 0 0.63 0.17 0.79 0.40 0.56
## 6 0 0.20 0.05 0.25 0.15 0.20
## 7 0 0.05 0.05 0.09 0.05 0.09
## 8 0 0.25 0.25 0.51 0.17 0.42
## 9 0 0.40 0.27 0.67 0.40 0.67
## 10 0 0.03 0.10 0.14 0.03 0.14
## Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 4 62 41 21 72 16948 0.6105907 0.3894093 Normal
## 2 1 86 24 62 95 39316 0.7982101 0.2017899 Normal
## 3 8 54 46 8 57 26092 0.6613117 0.3386883 Normal
## 4 2 77 29 48 91 43050 0.5115125 0.4884875 Normal
## 5 10 56 60 -4 54 20156 0.7149446 0.2850554 Normal
## 6 1 70 30 40 90 37799 0.6628868 0.3371132 Normal
## 7 1 70 30 40 90 37799 0.8150101 0.1849899 Normal
## 8 12 39 53 -14 43 20414 0.8250314 0.1749686 Normal
## 9 1 70 30 40 90 37799 0.4550676 0.5449324 TOTS
## 10 2 74 36 38 79 29003 0.7114636 0.2885364 Normal
## Warning: Novel levels found in column 'Nation': 'ARM', 'EQG', 'RUS', 'UKR',
## 'USA', 'WAL'. The levels have been removed, and values have been coerced to
## 'NA'.
Here are the predicted team of the season players for the Serie A this year:
| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Romelu Lukaku | ST | Inter | 32 | 2580 | 21 | 9 | 1 | 79 | 0.8603464 | Starter |
| Cristiano Ronaldo | ST | Juventus | 29 | 2463 | 25 | 2 | 3 | 66 | 0.7593194 | Starter |
| Lautaro Martinez | ST | Inter | 33 | 2238 | 15 | 5 | 1 | 79 | 0.6124925 | Starter |
| Matteo Politano | RM | Napoli | 32 | 1696 | 9 | 4 | 4 | 66 | 0.5310431 | Starter |
| Robin Gosens | LM | Atalanta | 27 | 2143 | 8 | 6 | 2 | 68 | 0.5165472 | Starter |
| Piotr Zielinski | CM | Napoli | 31 | 2154 | 6 | 8 | 4 | 66 | 0.5011879 | Starter |
| Cristian Romero | CB | Atalanta | 26 | 2095 | 2 | 2 | 2 | 68 | 0.4132391 | Starter |
| Juan Cuadrado | RB | Juventus | 25 | 1812 | 0 | 10 | 3 | 66 | 0.3629575 | Starter |
| Milan Skriniar | CB | Inter | 29 | 2507 | 3 | 0 | 1 | 79 | 0.3309675 | Starter |
| Rafael Toloi | CB | Atalanta | 28 | 2283 | 2 | 0 | 2 | 68 | 0.2898443 | Starter |
| Duvan Zapata | ST | Atalanta | 32 | 2052 | 14 | 7 | 2 | 68 | 0.6053750 | Bench |
| Alvaro Morata | ST | Juventus | 28 | 1788 | 9 | 9 | 3 | 66 | 0.5608440 | Bench |
| Nicolo Barella | CM | Inter | 32 | 2596 | 3 | 5 | 1 | 79 | 0.4919275 | Bench |
| Ruslan Malinovskyi | CM | Atalanta | 31 | 1525 | 6 | 9 | 2 | 68 | 0.4826799 | Bench |
| Jose Luis Palomino | CB | Atalanta | 31 | 2217 | 1 | 2 | 2 | 68 | 0.2563331 | Bench |
Serie A Team of the Season
Here we show how Kevin De Bruyne would be modeled in all the different leagues had he played in them in order to demonstrate the similarities and differences between the models.
In all of the leagues he performs fairly well, but some of the models treat assists as a more important stat, which makes him do better, while others place more negative weight on the fact that he has played slightly less this season.
In conclusion, we found that team of the season selection is very hard to predict. Our models did not predict the binary TOTS outcome well, but they did seem to order the predicted probabilities fairly well. The most useful stats for our models were how well the player’s team is doing and how much the player is playing. They used other stats fairly effectively as well, but they struggled to predict players who played well on worse teams. Thus these models likely couldn’t be used for much other than suggesting that much of what EA Sports does in picking who gets these cards is subjective. Making these models was consistent with our suspicion that there is no method to their madness. One interesting implication is how getting or not getting one of these cards affects the public’s perception of a player: are there players who should be rated more highly by soccer fans but aren’t because they didn’t get a team of the season card, and vice versa?